Morpho Challenge 2005-2010: Evaluations and Results
نویسندگان
چکیده
Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularly important for morphologically rich languages where inflection, derivation and composition can produce a huge amount of different word forms. Morpho Challenge aims at language-independent unsupervised learning algorithms that can discover useful morpheme-like units from raw text material. In this paper we define the challenge, review proposed algorithms, evaluations and results so far, and point out the questions that are still open.
منابع مشابه
Morpho Challenge competition 2005-2010: Evaluations and results
Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularl...
متن کاملUnsupervised Morpheme Analysis Evaluation by a Comparison to a Linguistic Gold Standard - Morpho Challenge 2008
The goal of Morpho Challenge 2008 was to find and evaluate unsupervised algorithms that provide morpheme analyses for words in different languages. Especially in morphologically complex languages, such as Finnish, Turkish and Arabic, morpheme analysis is important for lexical modeling of words in speech recognition, information retrieval and machine translation. The evaluation in Morpho Challen...
متن کاملProceedings of the Morpho Challenge 2010 Workshop
In natural language processing many practical tasks, such as speech recognition, information retrieval and machine translation depend on a large vocabulary and statistical language models. For morphologically rich languages, such as Finnish and Turkish, the construction of a vocabulary and language models that have a sufficient coverage is particularly difficult, because of the huge amount of d...
متن کاملEvaluating an Agglutinative Segmentation Model for ParaMor
This paper describes and evaluates a modification to the segmentation model used in the unsupervised morphology induction system, ParaMor. Our improved segmentation model permits multiple morpheme boundaries in a single word. To prepare ParaMor to effectively apply the new agglutinative segmentation model, two heuristics improve ParaMor’s precision. These precision-enhancing heuristics are adap...
متن کاملUnsupervised Morpheme Analysis Evaluation by IR experiments - Morpho Challenge 2007
This paper presents the evaluation of Morpho Challenge Competition 2 (information retrieval). The Competition 1 (linguistic gold standard) is described in a companion paper. In Morpho Challenge 2007, the objective was to design statistical machine learning algorithms that discover which morphemes (smallest individually meaningful units of language) words consist of. Ideally, these are basic voc...
متن کامل